Abstract

A sampling approach using cluster samples (segments) was used in 
LACIE to select Landsat data for classification and estimation. The 
selection of this sampling scheme was partially driven by the data 
registration technology available. As the registration of Landsat 
full frames enters the realm of current technology, sampling methods 
which utilize other than segment data should be examined. 

The objective was to assess the effect of separating the functions
of sampling for training and sampling for area estimation. The frame 
selected for analysis was acquired over north central Iowa on August 9,
1978. A stratification of the full-frame was defined. Training data 
came from segments within the frame. 

Two classification and estimation procedures were compared:
(1) statistics developed on one segment were used to classify that
segment and (2) pooled statistics from the segments were used to
classify a systematic sample of pixels.

Comparisons to USDA/ESCS estimates illustrate that the full-frame 
sampling approach can provide accurate and precise area estimates. 
I. INTRODUCTION


Accurate and timely crop production information is a 
critical need in today's economy. During the past decade,
satellite remote sensing has been increasingly recognized as 
a means for crop identification and estimation of crop 
areas. The Landsat multispectral scanner (MSS) records as a 
single data point (pixel) a region on the ground about one 
acre (0.5 ha) in size. When estimates of crop areas are 
desired for large regions, a statistical sampling scheme is
required as it is not feasible to examine all of the pixels 
in the region of interest. The development of a sampling 
strategy which is both efficient and cost-effective is thus 
an important objective. 

An extensive experiment, the Large Area Crop Inventory 
Experiment (LACIE), was conducted by NASA, the USDA, and
NOAA from 1974 through 1977 (1). Its data analysis
objective was to distinguish small grains from nonsmall 
grains using Landsat MSS data. Several other investigations 
have shown that the potential exists for identification and 
area estimation of corn and soybeans as well (2, 3, 4, 5). 

The LACIE area estimation system was based on analysis 
of sample segments or cluster samples (each 5 x 6 nm in 
size) extracted from multidate Landsat data. The selection 
of this sampling scheme was driven to a large degree by the 
data registration technology which was available at that 
time. Registration technology research has made 
considerable progress toward an operational registration 
capability for Landsat MSS full frames, and so we are no 
longer restricted to sampling small geographic regions, each 
of which has been separately registered. This allows us to 
examine the sampling efficiencies which may be introduced by 
using a smaller sampling unit size distributed over a larger
geographic area. 

One such sampling scheme, described by Bauer et al.
(2), separates the functions of sampling for training and
sampling for classification and area estimation. Training 
data were developed by photointerpretation of aerial 
photography taken along north-south flightlines located at 
intervals across the area of interest. For classification 
and crop area estimation, a systematic sample of pixels 
distributed throughout the region was used. The use of 
different sampling units for training and classification 
provides both convenience for the data analyst and high
precision of the resulting area estimates. 




II. OBJECTIVES 

The objective of this study was to further assess the 
effect of separating the functions of sampling for training 
and sampling for classification and area estimation. This 
approach requires ancillary data over only a small number of 
areas for training, but permits classification and crop area 
estimation over a large geographic region. Specifically, 
three related questions were addressed:

(a) How should training statistics be developed from 
the segment data to be representative of a larger 
area? 

(b) What methods should be utilized to determine over 
what geographic region the training statistics 
apply? 

(c) How does the accuracy of area estimates differ 
when segments or a systematic sample of pixels are 
used for estimation? 


III. APPROACH 

The data set available for this study was acquired over 
the U.S. corn and soybean production region by NASA during 
the 1978 crop season. For the LACIE-type sample segments (5
x 6 nm in size), Landsat data included multitemporally
registered MSS data and film writer imagery (PFC Product 1) 
for each acquisition and segment. Color infrared prints of 
aerial photography with ground inventory overlays were also 
used. For a subset of the segments, these inventories were 
also available in digital format. In addition, single-date 
Landsat MSS frames were acquired over several sites where 
segments were located. 

The Landsat frame selected for analysis was acquired 
over north central Iowa (Figure 1) on August 9, 1978, during 
the best time period for identification of corn and soybeans 
with unitemporal data (6). Although the use of single-date
Landsat data does not permit classification or area
estimation accuracies as high as could be obtained using
multitemporal data, it is expected that the relationship of
accuracies among methods obtained with unitemporal data is
the same as with multitemporal data.

The data analysis procedure consisted of first defining 
a stratification of the full-frame. The stratification 
schemes considered were: (a) using the refined strata 
developed by NASA/JSC based on agrophysical characteristics 
observable from Landsat imagery such as soil type and field
size, (b) using a subdivision of these strata to provide
strata with more homogeneous yields proposed for the USDA
AgRISTARS yield modeling activity, and (c) modifying those
two stratification systems.

Figure 1. Twelve-county study area
in north-central Iowa.

Sample segments with digital ground truth data located 
in the frame were used to provide training and test data. A 
modified supervised training approach was used to develop
statistics for each of the segments: training fields were 
selected on a systematic grid over the segment, and cover 
types were identified from ground observation data. All 
fields of one cover type (corn, soybeans, or "other") were
clustered together. 

Two sampling methods were used to select data for 
classification and area estimation. The first method was 
the method used in the LACIE project: the training 
statistics developed on one segment were used to classify 
that segment. Based on the results of classifying each 
segment in this manner, an area estimate was made for each 
county in the stratum. County estimates were defined as the 
average of the segment estimates within that county, as long 
as there was at least one segment in the county; otherwise, 
a ratio of the Landsat area estimates to the 1974 
agricultural census estimates for counties with sample 
segments was used to adjust the census data for estimation 
of counties without sample segments. 
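
The county estimation rule described above can be sketched as follows. This is a minimal illustration, not the procedure as actually implemented in the study: the function name, the data layout, and in particular the form of the adjustment ratio (a ratio of sums over the counties that contain segments) are assumptions, and the county labels and numbers are hypothetical.

```python
from statistics import mean


def county_estimates(segment_props, census_props):
    """Estimate a crop proportion for every county in a stratum.

    segment_props: county -> list of Landsat segment estimates (may be empty)
    census_props:  county -> census estimate for the same crop
    """
    # Counties with at least one segment: average the segment estimates.
    direct = {c: mean(v) for c, v in segment_props.items() if v}

    # ASSUMPTION: the adjustment ratio is the sum of Landsat estimates
    # divided by the sum of census estimates over counties with segments.
    ratio = sum(direct.values()) / sum(census_props[c] for c in direct)

    estimates = {}
    for county, census in census_props.items():
        if county in direct:
            estimates[county] = direct[county]
        else:
            # Counties without segments: census value scaled by the ratio.
            estimates[county] = ratio * census
    return estimates


# Hypothetical values, for illustration only.
segments = {"county_a": [40.0, 44.0], "county_b": [50.0], "county_c": []}
census = {"county_a": 40.0, "county_b": 50.0, "county_c": 30.0}
result = county_estimates(segments, census)
```

In this toy case the counties with segments receive their segment averages, and the county without segments receives its census value multiplied by the Landsat-to-census ratio.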

The second sampling method used to select data for 
classification and area estimation was a systematic sample 
throughout the area of interest. The pixel at every fifth 
line and column throughout each county was classified, and 
those results were used to make county area estimates. This 
provided about the same sampling density as one 5 x 6 nm 
segment per county. The classifications were conducted 
using a statistics deck pooled from the segments in the 
stratum. 
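
Taking the pixel at every fifth line and column retains one pixel in 25. A minimal sketch of such a systematic sample, and of the raw class proportions computed from it, is given below. The image layout, the `classify` callable, and the toy data are hypothetical stand-ins; the classifier actually used with the pooled statistics deck is not reproduced here.

```python
from collections import Counter


def systematic_sample(image, step=5, offset=0):
    """Return (line, column, pixel) for every `step`-th line and column."""
    return [
        (line, col, image[line][col])
        for line in range(offset, len(image), step)
        for col in range(offset, len(image[line]), step)
    ]


def sample_proportions(image, classify, step=5):
    """Classify only the sampled pixels and return class proportions."""
    sample = systematic_sample(image, step)
    counts = Counter(classify(pixel) for _, _, pixel in sample)
    n = len(sample)
    return {label: count / n for label, count in counts.items()}


# Toy 10 x 10 single-band "image": left half bright, right half dark.
toy_image = [[1.0 if col < 5 else 0.0 for col in range(10)] for _ in range(10)]


def toy_classifier(value):
    """Hypothetical threshold rule standing in for the real classifier."""
    return "corn" if value > 0.5 else "other"


toy_sample = systematic_sample(toy_image)
toy_props = sample_proportions(toy_image, toy_classifier)
```

Only the sampled pixels are classified, which is what keeps the cost of full-frame estimation comparable to that of classifying one segment per county.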

Finally, stratified area estimation (7) was used to 
make estimates of corn and soybean proportions. For county 
estimates, the pooled error matrix for all counties in the 
stratum was used. The evaluation of results was based on 
the data analysis objective of estimation of crop areas. 
Thus, the accuracy of proportion estimates as well as
classification accuracy was of interest. 
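
The text does not give the details of the stratified area estimator of reference (7). One common reading, assumed here, treats each classified class as a stratum and uses the error matrix from test pixels to estimate the true cover-type mix within each stratum. The sketch below follows that assumption; the function name, the data layout, and all numbers are hypothetical.

```python
def stratified_proportions(classified_props, error_matrix):
    """Adjust classified proportions using an error matrix.

    classified_props: classified label -> fraction of sampled pixels
    error_matrix:     classified label -> {true label: test pixel count}
    """
    estimates = {}
    for classified_label, weight in classified_props.items():
        row = error_matrix[classified_label]
        row_total = sum(row.values())
        for true_label, count in row.items():
            # Fraction of this classified stratum that is truly `true_label`,
            # weighted by the size of the stratum.
            share = weight * count / row_total
            estimates[true_label] = estimates.get(true_label, 0.0) + share
    return estimates


# Hypothetical classified proportions and pooled error matrix.
classified = {"corn": 0.5, "soybeans": 0.3, "other": 0.2}
errors = {
    "corn": {"corn": 80, "soybeans": 10, "other": 10},
    "soybeans": {"corn": 10, "soybeans": 80, "other": 10},
    "other": {"corn": 5, "soybeans": 5, "other": 90},
}
adjusted = stratified_proportions(classified, errors)
```

Under this reading, using the pooled error matrix for all counties in a stratum means the same `error_matrix` is applied to each county's classified proportions.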
IV. RESULTS

A. DEVELOPMENT OF TRAINING STATISTICS

The first objective of this study was to examine how
training statistics should be developed from the segment
data to best represent a stratum. To examine this
objective, a stratum containing three counties (Emmet, Palo
Alto, and Pocahontas) and five sample segments was selected.
Two methods were employed for pooling statistics from the
five segments. In the first, fields from each segment were
clustered by cover type, and then the statistics were pooled
across all of the segments (Training Procedure 1). In the
second method (Training Procedure 2), the fields from all
segments were first pooled and then clustered by cover type. 

The results for all of the sample segments in the 
stratum showed that higher classification accuracies were 
achieved when the training statistics were developed on each
segment and then pooled than when the fields were pooled by
type before clustering (Figure 2). This preference for
Training Procedure 1 is again emphasized by the county 
results shown in Table 1. The area estimates for both corn 
and soybeans were closer to USDA/ESS estimates when the 
statistics were first developed on each segment separately. 
The root mean square (RMS) errors for Training Procedures 1
and 2, respectively, are 2.6 vs. 3.3 for corn and 3.7 vs.
5.8 for soybeans.
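
For reference, an RMS error between a set of county estimates and the corresponding reference estimates can be computed as sketched below. The divisor (the number of counties, n) is an assumption, since the text does not state whether n or n - 1 was used, and the numbers are hypothetical rather than values from Table 1.

```python
from math import sqrt


def rms_error(estimates, references):
    """Root mean square difference between paired estimates."""
    if not estimates or len(estimates) != len(references):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    # ASSUMPTION: divide by n (not n - 1).
    return sqrt(sum(squared) / len(squared))


# Hypothetical county proportions (percent), for illustration only.
landsat = [40.0, 45.0, 50.0]
reference = [42.0, 44.0, 47.0]
example_rms = rms_error(landsat, reference)
```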

Based on the results of this study, the remaining
analyses discussed in this paper will use Training Procedure
1.

B. STRATIFICATION METHODOLOGY 

Once a method for developing training statistics had 
been defined, the next objective addressed was to define the 
geographic region to which these training statistics could
apply. The statistical concept required here is
stratification methodology. By the term "stratification,"
we refer to a subdivision of the population or universe into
subgroups, each of which is relatively homogeneous with
respect to a variable of interest which differs from one
subgroup to another. In defining strata to determine the
geographic region over which a set of statistics applies, we
want to define strata where corn "looks like" corn, and
soybeans "look like" soybeans. We will refer to this type
of stratification as spectral stratification. Four spectral
stratification systems were examined:

1. The refined strata, defined from agrophysical units
and used for allocation of sample segments in
AgRISTARS.

2. A modification of the refined strata, formed by
deleting the southernmost county.

3. The refined/split strata, defined as a
substratification of the refined strata for yield
estimation.

4. A modification of the refined/split strata, formed
by deleting the county furthest south in one of the
strata.

These stratifications will be referred to as Stratification
Methods 1, 2, 3, and 4.

Figure 2. Comparison of the overall
classification accuracies achieved using
two training methods. Each point represents
one sample segment. The solid line
represents equal accuracies for the two
methods.

All the counties were grouped into one stratum using 
Stratification Method 1 (Figure 1). After development of
statistics on a segment-by-segment basis
for the ten sample segments in the stratum, the statistics
decks from all segments were pooled to represent the
stratum. The divergence between cluster classes was
computed to determine any classes which should be pooled or
deleted. The statistics for two segments, both in Webster
County, were not compatible with the statistics from the
other sample segments; for example, the mean vector of a 
class of corn in one part of the stratum was the same as for 
a class of "other" in another part of the stratum resulting 
in a divergence of zero. 
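
The text does not specify which divergence measure was computed. A common choice in multispectral pattern recognition, assumed here, is the symmetric divergence between two Gaussian classes, which depends on both the mean vectors and the covariance matrices and vanishes when the two classes have identical statistics. The sketch below uses NumPy; the function name and the two-channel example values are hypothetical.

```python
import numpy as np


def divergence(mean_i, cov_i, mean_j, cov_j):
    """Symmetric divergence between two Gaussian classes (assumed measure)."""
    mean_i = np.asarray(mean_i, dtype=float)
    mean_j = np.asarray(mean_j, dtype=float)
    cov_i = np.asarray(cov_i, dtype=float)
    cov_j = np.asarray(cov_j, dtype=float)

    inv_i = np.linalg.inv(cov_i)
    inv_j = np.linalg.inv(cov_j)
    diff = (mean_i - mean_j).reshape(-1, 1)

    # Term driven by the difference in covariance structure.
    cov_term = 0.5 * np.trace((cov_i - cov_j) @ (inv_j - inv_i))
    # Term driven by the separation of the class means.
    mean_term = 0.5 * np.trace((inv_i + inv_j) @ (diff @ diff.T))
    return float(cov_term + mean_term)


# Hypothetical two-channel class statistics.
identity = np.eye(2)
same_class = divergence([30.0, 20.0], identity, [30.0, 20.0], identity)
shifted_class = divergence([0.0, 0.0], identity, [1.0, 0.0], identity)
```

A pair of classes from different cover types with a divergence at or near zero, such as `same_class` here, cannot be separated by the classifier, which is why such pairs signal that segment statistics should not be pooled into one stratum.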

Since two segments were spectrally anomalous from the 
rest of the segments, the county in which these two segments 
fell (Webster County) could not be considered to be in the
same spectral stratum with the other counties. One possible
reason for this is that Webster County has significantly
different patterns of precipitation than the other counties.
Since it is further south, it may also contain crops in
different stages of development than the other counties. 
Thus, Webster County was deleted from the stratum to form 
Stratification Method 2. 

Stratification Method 3 divided the region of interest 
into two refined/split strata (Figure 3). When segment
statistics were pooled to create statistics for the eastern
stratum, again the Webster County segments were anomalous.
Thus, Webster County was again deleted, resulting in
Stratification Method 4.

The results of this analysis illustrate that neither 
the refined strata nor the refined/split strata are 
sufficient for spectral stratification. The strata are 
apparently too broad to use as spectral strata. In defining
spectral strata, other factors need to be taken into
account, such as local weather, crop development stage, soil
productivity, soil type, and confusion crops present.
Further analyses were conducted using Stratification Methods
2 and 4 only.

Figure 3. The counties of interest
were divided into two strata, western and eastern, by
Stratification Method 3, the refined/split
strata.

C. COMPARISON OF PIXEL SAMPLES AND SEGMENT SAMPLES FOR 
CLASSIFICATION AND AREA ESTIMATION 

Three sampling schemes were compared as a basis for 
classification and crop area estimation in eleven counties: 

- Method A: estimation based on segment training and
classification (the LACIE method).

- Method B: estimation based on segment training and
classification of a systematic sample of pixels
throughout one stratum (Stratification Method 2).

- Method C: estimation based on segment training and
classification of a systematic sample of pixels
throughout two strata (Stratification Method 4).

Two types of accuracies were considered: classification
accuracy and proportion estimation accuracy. Since ground
data were available only on segments, classification 
accuracies were based on segment evaluation. Proportion 
estimation accuracy was evaluated on a county basis by 
comparison with the USDA/ESS estimates. 

Classification Accuracy. Classification accuracies
were generally higher on the segments when statistics
representing that segment alone were used in the
classification (Figure 4). This is to be expected since
spectral confusion classes are more likely to be present in 
the larger geographic region of the stratum. This result 
probably indicates, however, that a better spectral 
stratification still needs to be defined. 

Figure 5 compares the classification accuracies of 
Methods B and C. Most segments had higher classification 
accuracies when two strata were used. This confirms the 
previous hypothesis that spectral strata are somewhat
smaller than the refined strata.

Proportion Estimation Accuracy. The proportion
estimates of corn and soybeans in each county are shown in
Table 2 for each of the three stratification and sampling
methods. Figure 6 shows the comparison between corn
proportion estimates made by each of the three methods with
the USDA/ESS estimate for the same county. The correlations
between the Landsat and USDA estimates for corn are
relatively high for all three methods (0.77 for Method A;
0.62 and 0.83 for Methods B and C, respectively). For
soybeans, however, the correlations were much lower except
for Method B (R = 0.81). Methods A and C had correlations of
0.51 and 0.09, respectively.

Figure 5. Overall accuracies using
two systematic sampling methods.

Table 2. Proportion estimates of corn and soybeans made using three
different stratification and sampling methods and USDA/ESS estimates for

Table 3 compares these estimates to the USDA/ESS 
proportion estimates by examining the root mean square (RMS) 
errors of the several methods. In the western stratum,
Method C performed competitively with Method A (2.6 vs. 2.2)
for corn, and both systematic sampling methods performed
better than Method A for soybeans. In the eastern stratum,
however, Method A performed better than Methods B and C for
corn and better than Method C for soybeans. 

The results here indicate the potential for using
pooled statistics from segment data to represent a spectral
stratum. The results from the western stratum illustrate 
that a good spectral stratification can provide area 
estimates that are as accurate or more accurate than 
segment-based estimation. In addition, the precision of the 
estimates made from the systematic sample will be greater. 

The eastern stratum results, on the other hand, show a 
general degradation in accuracy when the systematic sample 
is utilized. We believe this is due to one of two causes: 
first, only three sample segments were available to provide 
training data for the eight counties in the stratum, so the 
spectral subclasses in the stratum may not be
well-represented; second, the geographic extent of the eastern
stratum (eight counties) is relatively large and may be too
broad for a good spectral stratification. The results
indicate that both of these potential causes may be
contributors to the lowered accuracy. The lack of training
data may be a factor since the single stratum accuracy 
(eight training segments) was higher for both crops than the 
two stratum accuracy (three training segments). The 
hypothesis that the eastern stratum is too broad is based on 
the fact that neither systematic sampling method provided 
accuracies as good as the segment-based estimation method. 


V. SUMMARY 

The potential for using pooled segment statistics for 
an entire stratum is indicated by the generally good 
performance for both corn and soybeans in the western 
stratum. This type of training approach used with 
classification of a systematic sample of pixels seems to 
merit further investigation due to the variance reduction 
benefits which could be obtained. In particular, the 
potential shown for this method should be more fully 
investigated using multitemporal data which should produce 
still higher classification accuracies and more accurate
area estimates. 

However, a key factor in using a systematic sampling
approach for area estimation has been found to be the
definition of spectral strata - that region over which one
set of training statistics can apply. It has been
illustrated that the refined and refined/split strata based
on agrophysical units are not of sufficient spatial 
resolution to provide a good spectral stratification. 
Research into the physical factors defining the strata and 
into methods of stratification will be an important task in 
the development of a full-frame sampling strategy. 